Optimum simultaneous discretization with data grid models in supervised classification: a Bayesian model selection approach

Author

  • Marc Boullé
Abstract

In the domain of data preparation for supervised classification, filter methods for variable ranking are time-efficient. However, their intrinsic univariate limitation prevents them from detecting redundancies or constructive interactions between variables. This paper introduces a new method to automatically, rapidly and reliably extract the classificatory information of a pair of input variables. It is based on a simultaneous partitioning of the domain of each input variable, into intervals in the numerical case and into groups of categories in the categorical case. The resulting input data grid makes it possible to quantify the joint information between the two input variables and the output variable. The best joint partitioning is found by maximizing a Bayesian model selection criterion. Extensive experiments demonstrate the benefits of the approach, especially the significant improvement of accuracy for classification tasks.
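As a rough illustration of why a joint data grid can reveal an interaction that univariate ranking misses, the sketch below bins two numerical inputs into a fixed 2 x 2 grid and measures how much the grid cells tell us about the class. It is only a sketch under stated assumptions: the equal-width bins, the use of empirical mutual information as the score, the XOR-like toy data, and all function names are assumptions made for this example. It does not implement the paper's Bayesian model selection criterion or its search for the best partition.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to an interval index in [0, n_bins - 1] (assumed binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant variable
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(cells, labels):
    """Empirical mutual information, in bits, between cell ids and class labels."""
    n = len(labels)
    joint = Counter(zip(cells, labels))
    cell_counts = Counter(cells)
    label_counts = Counter(labels)
    return sum(
        (c / n) * math.log2(c * n / (cell_counts[cell] * label_counts[label]))
        for (cell, label), c in joint.items()
    )


def grid_information(x1, x2, y, n_bins=2):
    """Score each input alone and the joint grid built from both inputs."""
    b1 = equal_width_bins(x1, n_bins)
    b2 = equal_width_bins(x2, n_bins)
    return {
        "x1": mutual_information(b1, y),
        "x2": mutual_information(b2, y),
        "grid": mutual_information(list(zip(b1, b2)), y),
    }


# Toy XOR-like data: the class depends on the pair (x1, x2), not on either alone.
points = [(i / 10, j / 10) for i in range(10) for j in range(10)]
x1 = [a for a, _ in points]
x2 = [b for _, b in points]
y = [int((a < 0.5) != (b < 0.5)) for a, b in points]

info = grid_information(x1, x2, y)
print(info)
```

On this toy data each variable taken alone carries no information about the class, while the joint grid separates the classes completely. A fixed grid is used here only for brevity; the method described in the abstract instead selects the partition itself.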

Similar resources

Optimal Bayesian 2D-Discretization for Variable Ranking in Regression

In supervised machine learning, variable ranking aims at sorting the input variables according to their relevance w.r.t. an output variable. In this paper, we propose a new relevance criterion for variable ranking in a regression problem with a large number of variables. This criterion comes from a discretization of both input and output variables, derived as an extension of a Bayesian non para...

Multivariate Data Grid Models for Supervised and Unsupervised Learning Note technique

This paper introduces a new method to automatically, rapidly and reliably evaluate the class conditional information of any subset of variables in supervised learning. It is based on a partitioning of each input variable, into intervals in the numerical case and into groups of values in the categorical case. The cross-product of the univariate partitions forms a multivariate partition of the in...

Classifier Learning with Supervised Marginal Likelihood

It has been argued that in supervised classification tasks it may be more sensible to perform model selection with respect to a more focused model selection score, like the supervised (conditional) marginal likelihood, than with respect to the standard unsupervised marginal likelihood criterion. However, for most Bayesian network models, computing the supervised marginal likelihood score takes ...

Supervised pre-processing approaches in multiple class variables classification for fish recruitment forecasting

A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting of the fish species. Recent advances in Bayesian networks direct the learning of models with several interrelated variables to be forecasted simultaneously. These models are known as multi-dimensional Bayesian network classifiers (MDBNs). Pre...

A Semi-supervised Learning Approach to Object Recognition with Spatial Integration of Local Features and Segmentation Cues

This chapter presents a principled way of formulating models for automatic local feature selection in object class recognition when there is little supervised data. Moreover, it discusses how one could formulate sensible spatial image context models using a conditional random field for integrating local features and segmentation cues (superpixels). By adopting sparse kernel methods and Bayesian...

Journal:
  • Adv. Data Analysis and Classification

Volume 3  Issue

Pages  -

Publication date: 2009